Robust Word Recognition

Zhao (Leo) Cheng, Benjamin Walker

Summary

This article proposes the objective, methodology and discussions in the area of word recognition for completing the term project.

Objective

This project is to build word recognition software which processes words of varying quality. The word recognition problem can be modeled as a classification process, recognizing the word by comparing each character with a trained classifier. This is a very good problem which builds machine learning skills through hands-on experience in supervised/unsupervised learning, and some other feature extracting techniques used in character classification. Similar projects (Spring 2012) in optical word and handwritten character recognition have shown satisfying performance in achieving high accuracy with high quality testing data [1, 2]. This project will apply different algorithms in feature extraction and classification, and train these algorithms with a more challenging dataset.

In recent decades great progress has been made in character recognition. There are many commercialized Optical Character Recognition (OCR) and hand writing recognition packages which provide accurate and high speed results on good quality input. However, these packages do not work well on poorly written or low quality documents. Human beings do this type of work very well based on two factors, 1) exceptional capability to separate the character with its complex background 2) a preliminary knowledge in understanding the words make it possible to predict the word with partial or incomplete information. In this project, we plan to explore some techniques inspired by human capability to improve the accuracy of word recognition algorithms.

Methodology

The process of word recognition has been partitioned into five tasks, which include image preprocessing, word segmentation, feature extraction/classifier training, classifier design, and performance assessment. In the phase of image preprocessing, an unsurprised clustering algorithm may be used to separate the word from its background. In order to segment the words into individual characters, a region connection algorithm will be implemented. We are planning to try multiple classification techniques, including decision tree, k-nearest neighbor (KNN), Support Vector Machine (SVM) and Artificial Neural network (ANN). Finally, the 10-folder cross validation technique will be applied on training dataset to select the best model, and the performance will be assessed by validation of the test dataset.

Resource

This project requires two persons with adequate knowledge of Matlab coding and machine learning techniques.

The training data and test data will be obtained from [4]. Each dataset is provided as a zip file, and contains a set of JPEG images of single words and an XML tag file. In the ICDAR 2003 dataset, the training dataset contains labeled 1157 words and the testing dataset is has 1111 words [3]. More data set with similar format can be obtained from [5-7].

Timeline

The project has been partitioned into five tasks, each of which is evaluated and assigned,

Tasks	Content	Workload (week)
Image Preprocessing	Read data into image matrices from image files Sharpen the image when necessary Separate the word from its background by checking color histogram	0.5
Word Segmentation	Segment the words based on region connection Cut the image into characters	0.5
Feature Extraction & Classifier Training	Apply a feature extraction technique to train a classifier Learn algorithms with the following classifiers	1.5
Classifier Design	Decision Tree k-nearest Neighbor (KNN) Support Vector Machine (SVM) Artificial Neural Network (ANN)	2 (0.5x4)
Performance Assessment	k-fold cross validation	0.5
Documentation & Presentation	Milestone report, Final report Presentation	1

We are expecting to finish the first three tasks by milestone deadline. In the milestone report, a detailed report on techniques of image preprocessing, word segmentation and feature extraction will be presented. We will also test the word recognition and get some results.

Discussion

Figure 1 shows the ambiguity in the letter recognition problem. If we look from the top-down direction of the left five words, a sequence of letters (A, B, C) will come into human’s mind. However, it will be a set of numbers (12, 13, 14) if we look from left to the right. Recognition is a very hard problem, and trying to do it without considering the context might result in a classification disaster. To figure this out, we might need a dictionary or to build a dictionary by some kind of learning techniques. Once we have the context, the classification rate might improve dramatically. By checking the bottom right word in Figure 1, a classifier would get the result “CQDE”, however this makes no sense for human beings. The most possible reason is the written error of the letter “O”. An auto-correction may be made with the context to help the machine understanding (recognizing) words instead of isolated characters.

Figure 1 Word Recognition Example

References

[1] Jonathan Connell and Vijay Kothari, “Handwritten Character Recognition”, CS 74/174 - Spring 2012, final report of term project

[2] Yuxi Zhang, “Optical Word Recognition”, CS 74/174 - Spring 2012, final report of term project

[3] Shahab, Asif; Shafait, Faisal; Dengel, Andreas; , "ICDAR 2011 Robust Reading Competition Challenge 2: Reading Text in Scene Images," International Conference on Document Analysis and Recognition (ICDAR), 2011 , pp.1491-1496, 18-21 Sept. 2011

[4] ICDAR 2003 Competitions website, http://algoval.essex.ac.uk/icdar/

[5] ICDAR 2005 Competitions website, http://algoval.essex.ac.uk:8080/icdar2005/

[6] ICDAR 2011 Competitions website, http://www.cvc.uab.es/icdar2011competition/

[7] Letter Recognition Dataset, visited on 1/22/2013, http://archive.ics.uci.edu/ml/datasets/Letter+Recognition.

Last updated on 1/22/2013